Efficient Algorithms for Masking and Finding Quasi-Identifiers

Authors

Rajeev Motwani

Ying Xu

Abstract

A quasi-identifier is a subset of attributes that can uniquely identify most tuples in a table. Incautious publication of quasi-identifiers leads to privacy leakage. In this paper we consider the problems of finding and masking quasi-identifiers. Both problems are provably hard, with severe time and space requirements, so we focus on designing efficient approximation algorithms for large data sets. We first propose two natural measures for quantifying quasi-identifiers: distinct ratio and separation ratio. We develop efficient algorithms that find small quasi-identifiers with provable guarantees on size and on separation or distinct ratio, with space and time requirements sublinear in the number of tuples. We also propose efficient algorithms for masking quasi-identifiers, where we use a random sampling technique to greatly reduce the space and time requirements without much sacrifice in the quality of the results. Our algorithms for masking and finding quasi-identifiers apply naturally to stream databases. Extensive experimental results on real-world data sets confirm the efficiency and accuracy of our algorithms.
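The abstract names the two measures but does not define them. The sketch below is one plausible reading, not the authors' algorithm. It assumes that the distinct ratio is the number of distinct projected values divided by the number of tuples, and that the separation ratio is the fraction of tuple pairs that differ on at least one chosen attribute. The function names, the toy table, and the pair-sampling estimator are hypothetical illustrations of the random sampling idea.

```python
import random
from collections import Counter


def project(rows, attrs):
    """Project each row (a dict) onto the chosen attributes."""
    return [tuple(row[a] for a in attrs) for row in rows]


def distinct_ratio(rows, attrs):
    """Assumed definition: distinct projected values / number of tuples."""
    if not rows:
        return 0.0
    return len(set(project(rows, attrs))) / len(rows)


def separation_ratio(rows, attrs):
    """Assumed definition: fraction of tuple pairs separated by attrs."""
    n = len(rows)
    total_pairs = n * (n - 1) // 2
    if total_pairs == 0:
        return 1.0
    counts = Counter(project(rows, attrs))
    # Pairs falling in the same group are NOT separated.
    unseparated = sum(c * (c - 1) // 2 for c in counts.values())
    return 1.0 - unseparated / total_pairs


def sampled_separation_ratio(rows, attrs, num_pairs, seed=0):
    """Estimate the separation ratio from randomly sampled tuple pairs.

    Hypothetical estimator illustrating the sampling idea only; it
    requires at least two rows.
    """
    rng = random.Random(seed)
    separated = 0
    for _ in range(num_pairs):
        i, j = rng.sample(range(len(rows)), 2)
        if any(rows[i][a] != rows[j][a] for a in attrs):
            separated += 1
    return separated / num_pairs


# Toy table (made-up values, for illustration only).
rows = [
    {"zip": "94305", "age": 30, "sex": "F"},
    {"zip": "94305", "age": 30, "sex": "M"},
    {"zip": "94040", "age": 30, "sex": "F"},
    {"zip": "94040", "age": 41, "sex": "F"},
]

print(distinct_ratio(rows, ["zip"]))
print(separation_ratio(rows, ["zip"]))
print(distinct_ratio(rows, ["zip", "age", "sex"]))
```

On this toy table, `zip` alone takes two distinct values across four tuples and leaves two of the six tuple pairs unseparated, while the full attribute set separates every pair. Under these assumed definitions, an attribute set with a high ratio would be a candidate quasi-identifier.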


Related resources

Efficient Algorithms for Just-In-Time Scheduling on a Batch Processing Machine

The just-in-time scheduling problem on a single batch processing machine is investigated in this research. Batch processing machines can process more than one job simultaneously and are widely used in the semiconductor industry. Due to the requirements of the just-in-time strategy, minimization of total earliness and tardiness penalties is considered as the criterion. It is an acceptable criterion for b...


Finding the Shortest Hamiltonian Path for Iranian Cities Using Hybrid Simulated Annealing and Ant Colony Optimization Algorithms

The traveling salesman problem is a well-known and important combinatorial optimization problem. The goal of this problem is to find the shortest Hamiltonian path that visits each city in a given list exactly once and then returns to the starting city. In this paper, for the first time, the shortest Hamiltonian path is achieved for 1071 Iranian cities. For solving this large-scale problem, tw...


P-Sensitive K-Anonymity with Generalization Constraints

Numerous privacy models based on the k-anonymity property and extending the k-anonymity model have been introduced in the last few years in data privacy research: l-diversity, p-sensitive k-anonymity, (α, k)-anonymity, t-closeness, etc. While differing in their methods and quality of their results, they all focus first on masking the data, and then protecting the quality of the data as a wh...


I-138: Protecting Identifiers in Cross-Domain Environments

Unique identification of objects and their associated data representations has received significant attention in the past 10 years. Developing an efficient identifier allocation and tracking scheme that transparently spans security domains requires finesse. It is not uncommon for information to be created in a lower security domain and copied to a higher domain. The rigor with which the data is ...


Threshold Implementation as a Countermeasure against Power Analysis Attacks

One of the usual ways to find sensitive data or secret parameters of cryptographic devices is to exploit their physical leakage. Power analysis is one of the attacks that fall under such a model. Compared with other types of side channels, power analysis is very efficient and has a high success rate, so it is important to provide a countermeasure against it. Different types of countermeasures use ...



Publication date: 2007
